Публикации / Publications

36 thoughts on “Публикации / Publications”

  1. We can get value both at the static stage and at the dynamic one
    (fastapi, pydantic, msgspec, beartype, swagger)

    — There are often faster counterparts to popular packages within the language itself (pandas -> polars,
    uvicorn -> granian, fastapi -> (I will cover it in another talk), pydantic v2 -> msgspec); see the sketch below for the pydantic -> msgspec case.
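
    To make that last substitution concrete, here is a minimal sketch comparing the same JSON validation in pydantic v2 and msgspec. The `User`/`UserModel` schema and the payload are purely illustrative, and both libraries are assumed to be installed:

    ```python
    from pydantic import BaseModel
    import msgspec

    payload = b'{"id": 1, "name": "Alice"}'

    # pydantic v2: declarative model, validated from JSON
    class UserModel(BaseModel):
        id: int
        name: str

    user_pd = UserModel.model_validate_json(payload)

    # msgspec: the same schema as a Struct, decoded and validated from JSON
    class User(msgspec.Struct):
        id: int
        name: str

    user_ms = msgspec.json.decode(payload, type=User)

    print(user_pd, user_ms)
    ```

    The two snippets express the same schema; msgspec typically decodes and validates noticeably faster, which is the point of the substitution.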

  2. import pandas as pd
    import numpy as np
    import json

    # Read in the Excel file
    df = pd.read_excel('file.xlsx')

    # Convert the 'date' column to datetime
    df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y')

    # Sort the dataframe by date
    df.sort_values('date', inplace=True)

    # List of holidays (modify this according to your needs)
    holidays = ['2019-12-25', '2020-01-01']

    # Calculate the number of business days between each operation and the next one
    next_dates = df['date'].shift(-1)
    df['business_days_between_operations'] = [
        np.busday_count(current.date(), following.date(), holidays=holidays)
        if pd.notna(following) else np.nan
        for current, following in zip(df['date'], next_dates)
    ]

    # Resample the data to get the maximum number of business days between operations for each month
    max_business_days_between_operations = df.resample('M', on='date')['business_days_between_operations'].max()

    # Convert the result to a dictionary with string keys so it can be serialized to JSON
    result = {str(ts.date()): value for ts, value in max_business_days_between_operations.items()}

    # Convert dictionary to JSON
    json_result = json.dumps(result)

    print(json_result)

  3. To automatically select the date format from a list of candidates when reading a DataFrame from an Excel file, you can use the pandas library in Python. Here’s a code snippet that demonstrates this:

    import pandas as pd

    # List of date formats to check
    date_formats = ['%m/%d/%Y', '%d/%m/%Y', '%Y-%m-%d']

    # Read the Excel file into a DataFrame
    df = pd.read_excel('your_file.xlsx')

    # Iterate over each column in the DataFrame
    for col in df.columns:
        # Iterate over each date format candidate
        for date_format in date_formats:
            try:
                # Attempt to convert the column to datetime using the current format
                df[col] = pd.to_datetime(df[col], format=date_format)
                # If successful, break the loop and move to the next column
                break
            except ValueError:
                pass

    # Print the updated DataFrame with converted dates
    print(df)

    In this code snippet, we first define a list of date formats (date_formats) that you want to check for each column. Then, we read the Excel file into a DataFrame using pd.read_excel(). Next, we iterate over each column in the DataFrame and try to convert it to datetime using each date format candidate. If the conversion is successful, we break the loop and move on to the next column. Finally, we print the updated DataFrame with converted dates.

    Note: Make sure to replace ‘your_file.xlsx’ with the actual path and filename of your Excel file.

  4. Sure! To handle a column with mixed date formats, you can modify the previous code snippet to check each value in the column individually and attempt to convert it using each date format candidate. Here’s an updated version of the code:

    import pandas as pd

    # List of date formats to check
    date_formats = ['%m/%d/%Y', '%d/%m/%Y', '%Y-%m-%d']

    # Read the Excel file into a DataFrame
    df = pd.read_excel('your_file.xlsx')

    # Iterate over each column in the DataFrame
    for col in df.columns:
        # Iterate over each value in the column
        for idx, value in df[col].items():
            # Iterate over each date format candidate
            for date_format in date_formats:
                try:
                    # Attempt to convert the value to datetime using the current format
                    df.at[idx, col] = pd.to_datetime(value, format=date_format)
                    # If successful, break the loop and move to the next value
                    break
                except ValueError:
                    pass

    # Print the updated DataFrame with converted dates
    print(df)

    In this updated code, we added an additional loop to iterate over each value in the column (for idx, value in df[col].items()). For each value, we try to convert it to datetime using each date format candidate. If the conversion is successful, we update the value in the DataFrame (df.at[idx, col] = pd.to_datetime(value, format=date_format)), which avoids chained-assignment issues, and break the loop to move on to the next value.

    Note: Make sure to replace ‘your_file.xlsx’ with the actual path and filename of your Excel file.

  5. For an experienced Python developer working on a machine learning classification service, optimizing data processing can provide significant speedups and potentially enhance robustness. Here are some upgrades you could consider:

    1. **Efficient Data Structures**: Use pandas DataFrames for structured data. Consider utilizing `categorical` data types for categorical features, as they can be more memory- and speed-efficient.

    2. **Vectorization**: Avoid loops where possible and leverage vectorized operations provided by NumPy, pandas, or similar libraries, which are designed to be fast and efficient (see the short sketch after this list).

    3. **Use Efficient Libraries**: For numerical computation, libraries like NumPy and SciPy are optimized for performance. For machine learning, libraries like Scikit-learn, TensorFlow, or PyTorch offer optimized routines for data processing and model training.

    4. **Concurrent Processing**: Utilize Python’s multiprocessing or concurrent.futures module to parallelize data processing tasks that are independent of each other.

    5. **Optimize I/O Operations**: Reading and writing data can be a bottleneck. Use efficient formats like HDF5, Parquet, or Feather for storing and accessing your data. These formats are optimized for fast read and write operations and allow for selective loading of data.

    6. **Memory Mapping**: For extremely large datasets that cannot fit into memory, use memory mapping (e.g., NumPy’s memmap function) to only load parts of the data into memory as needed.

    7. **Incremental Learning**: If your dataset is too large to process at once, consider incremental learning methods, where the model is updated in batches, reducing memory overhead.

    8. **Data Caching**: When working with data that require expensive preprocessing steps, consider caching the processed data on disk to avoid repeating these steps in subsequent runs.

    9. **Profiling and Optimization**: Profile your code to identify bottlenecks. Use profiling tools like cProfile, line_profiler, or memory_profiler to find inefficiencies.

    10. **Use Just-In-Time Compilation**: Try just-in-time (JIT) compilation provided by libraries such as Numba to compile performance-critical code sections into machine code.

    11. **Algorithmic Improvements**: Review the algorithms and data-processing steps you’re using. Sometimes, using more efficient algorithms or data structures can result in significant speed improvements.

    12. **Robust Error Handling**: When processing data, implement try-except blocks to handle potential data quality issues without crashing your processing pipeline.

    13. **Data Quality Checks**: Implement robust preprocessing to handle missing data, outliers, and anomalies, which can improve both the speed (by preventing reprocessing) and the robustness of your ML service.

    14. **Feature Engineering**: Intelligent feature selection can greatly reduce model complexity and data processing time, focusing only on the most informative features.

    15. **Hardware Acceleration**: If you’re already optimized on the software end, consider using hardware acceleration (e.g., GPUs or TPUs) for data processing and model training.

    16. **Distributed Processing**: For extremely large datasets or complex processing pipelines, consider a distributed processing framework like Apache Spark or Dask, which can handle data that exceeds a machine’s memory capacity and can distribute the workload across clusters.

    By strategically implementing these upgrades, you can significantly speed up your data processing and increase the robustness of your machine learning classification service. The appropriateness of each of these upgrades will depend on the specific bottlenecks and performance issues you are encountering with your current setup.
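
    As a small illustration of the vectorization point above, here is a minimal sketch; the column names and sizes are made up for the example:

    ```python
    import numpy as np
    import pandas as pd

    # Hypothetical feature table used only for the comparison
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"amount": rng.random(100_000), "rate": rng.random(100_000)})

    # Row-by-row apply: each row triggers Python-level work
    slow = df.apply(lambda row: row["amount"] * row["rate"], axis=1)

    # Vectorized: one operation over whole columns, executed in optimized C code
    fast = df["amount"] * df["rate"]

    assert np.allclose(slow, fast)
    ```

    On frames of this size the vectorized version is typically orders of magnitude faster, which is usually the cheapest win in a preprocessing pipeline.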

  6. If you’re dealing with deprecation warnings coming from within your virtual environment (for example, from packages installed in the `.venv` folder), you can still use the methods outlined previously, but you might want to be more specific in your filtering so that you don’t suppress warnings from your own code.

    Here’s how you can modify the Python `warnings` filter to ignore warnings specifically from modules in the `.venv` folder:

    1. **Using `warnings.filterwarnings` with a Specific Prefix:**
    You can suppress warnings by specifying a pattern that matches the file paths of your virtual environment. You’ll often find that the file paths of the installed packages within a virtual environment start with the path to the virtual environment itself.

    ```python
    import warnings
    import os
    import re

    # Get the path to the virtual environment
    venv_path = os.path.join(os.path.dirname(__file__), '.venv')

    # Escape backslashes for Windows paths
    venv_path = re.escape(venv_path)

    # Suppress DeprecationWarnings coming from packages within the virtual environment
    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        module=r"{}.*".format(venv_path)
    )
    ```

    Replace `.venv` with the name of your virtual environment folder if it’s different.

    2. **Using a Context Manager for a Specific Code Block:**

    If you want to suppress the warnings only for a specific block of code and you know that the code will execute functions from the virtual environment, you can use a context manager:

    ```python
    import warnings

    # venv_path is computed (and escaped) as in the previous snippet
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            category=DeprecationWarning,
            module=r"{}.*".format(venv_path)
        )
        # Code that imports or uses modules from the virtual environment goes here
    ```

    Again, it’s important to note that you should use these techniques with care, as they might hide important warnings that could affect the future compatibility and functionality of your code. It’s always best to stay up-to-date with the libraries you’re using and refactor your code to avoid the use of deprecated features.

  7. -Xms6048m
    -Xmx8048m
    -Xss128m
    -XX:NewRatio=2
    -Dfile.encoding=UTF-8
    -XX:SoftRefLRUPolicyMSPerMB=250
    -XX:NewSize=1024m
    -XX:MaxNewSize=1024m
    -XX:PermSize=1024m
    -XX:MaxPermSize=2048m
    -XX:ParallelGCThreads=4
    -XX:MaxTenuringThreshold=1
    -XX:SurvivorRatio=8
    -XX:+UseCodeCacheFlushing

  8. > Evgenii Varseev:
    You are an experienced Python developer. Your app receives a well-formed XML file via a message broker, but for debugging you only have a plain XML message. How would you convert a conventional XML FILE to a WELL-FORMED XML FILE? Please provide a snippet of code.

    > Нейрокот:
    To convert a conventional XML file to a well-formed XML file, you can make use of the `xml.dom.minidom` module in Python. Here’s a snippet of code that demonstrates this conversion:

    ```python
    import xml.dom.minidom

    def convert_to_well_formed_xml(file_path):
        # Parse the conventional XML file
        xml_doc = xml.dom.minidom.parse(file_path)

        # Create an unformatted string representation of the XML document
        xml_str = xml_doc.toprettyxml(indent="")

        # Remove leading/trailing whitespace and newlines
        well_formed_xml_str = xml_str.strip()

        return well_formed_xml_str

    # Example usage
    conventional_xml_file_path = "conventional.xml"

    # Convert the conventional XML file to a well-formed XML string
    well_formed_xml_str = convert_to_well_formed_xml(conventional_xml_file_path)

    # Print the well-formed XML string to console
    print(well_formed_xml_str)
    ```

    In this code snippet, we first parse the conventional XML file using `xml.dom.minidom.parse()`. Then, we use the `toprettyxml()` method to create a string representation of the XML document with indentation. Finally, we remove any leading/trailing whitespace and newlines to obtain the well-formed XML string.

    Note: The `xml.dom.minidom` module is part of Python’s standard library, so no separate installation is required.

    • As an experienced Python developer, it’s important to follow consistent naming conventions to ensure clarity and readability in your code. While it is generally recommended to use verbs or verb phrases for method names to indicate an action or behavior, there are exceptions, and using a name without a verb can be acceptable in certain cases.

      In the context of checking whether a file is oversized, both naming conventions can be suitable depending on your preference and the overall context of your codebase. Here are some considerations for each option:

      1. Naming with a Verb:
      – If you choose a name like `check_if_file_is_oversized`, the method name explicitly conveys the action being performed (checking) and the condition being evaluated (file is oversized).
      – This naming style provides a clear indication of the method’s purpose and behavior, making it easier for other developers (or your future self) to understand the function at a glance.

      2. Naming without a Verb:
      – If you opt for a name like `file_is_oversized`, the method name serves as a descriptive statement about the state or condition of the file (oversized).
      – While there is no explicit verb indicating an action, the name still conveys information about the outcome of the check, which can be sufficient, especially if the call site provides enough context.

      Ultimately, the choice between including a verb in the method name or using a name without a verb is a matter of personal preference and style guidelines within your codebase or organization. Just ensure that the chosen name accurately reflects the purpose and behavior of the method. Whichever naming convention you choose, consistency and clarity are key for effective code maintenance and collaboration; a small illustrative sketch of both options follows below.
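
      For instance, a minimal sketch of the two styles (the size limit, the `Path`-based signature, and the `upload_path`/`reject_upload` names at the call site are assumptions made only for the example):

      ```python
      from pathlib import Path

      MAX_SIZE_BYTES = 5 * 1024 * 1024  # hypothetical limit, for illustration only

      # Option 1: verb-style name, reads as an action being performed
      def check_if_file_is_oversized(path: Path) -> bool:
          return path.stat().st_size > MAX_SIZE_BYTES

      # Option 2: predicate-style name, reads naturally inside a condition
      def file_is_oversized(path: Path) -> bool:
          return path.stat().st_size > MAX_SIZE_BYTES

      # At the call site the predicate style often reads most fluently:
      # if file_is_oversized(upload_path):
      #     reject_upload(upload_path)
      ```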

  9. In Python using pandas, you can achieve this by using the `loc` function to subset the dataframe based on the specified conditions and then update the ‘operation_code’ column with the value 52. Here is a code snippet to accomplish this:

    ```python
    import pandas as pd

    # Create a sample dataframe
    data = {'operation_description': ['credit', 'purchase', 'withdrawn', 'balance'],
            'operation_code': [10, 20, 30, 40]}
    df = pd.DataFrame(data)

    # Set operation code to 52 if 'credit', 'debit', or 'withdrawn' is present in operation_description
    df.loc[df['operation_description'].isin(['credit', 'debit', 'withdrawn']), 'operation_code'] = 52

    # Display the updated dataframe
    print(df)
    ```

    This code snippet first creates a sample dataframe with ‘operation_description’ and ‘operation_code’ columns. It then uses the `loc` function to locate the rows where the ‘operation_description’ is ‘credit’, ‘debit’, or ‘withdrawn’, and sets the ‘operation_code’ to 52 for those rows. Finally, the updated dataframe is displayed.

  10. ```python
    import pandas as pd

    # Create a sample DataFrame
    data = {'operation_description': ['credit card payment', 'withdrawn from atm', 'debit transaction', 'transfer to savings']}
    df = pd.DataFrame(data)

    # Define the list of words to check for in operation_description
    words_to_check = ['credit', 'debit', 'withdrawn']

    # Set the operation code to 52 if any of the words_to_check is present in the operation_description column
    df['operation_code'] = df['operation_description'].apply(lambda x: 52 if any(word in x.lower() for word in words_to_check) else None)

    print(df)
    ```

    • Assignment for a Python ML developer position.
      Focus area: audio analysis and recognition.

      What commercial development experience do you have in an ML position?
      Do you have experience developing your own neural network models or applying existing open source solutions? If so, describe the projects or tasks in which you applied neural networks.
      Which neural network architectures are you familiar with?
      Which programming language do you use in your work?
      Which tools/frameworks do you use, and for which tasks?
      Which libraries do you have to work with?
      Have you worked with SQL before (raw queries, ORMs)? Which ORMs have you used? Which DBMSs have you worked with (SQLite, PostgreSQL, MySQL, MongoDB, ClickHouse, etc.)? What is the largest database you have worked with?
      What difficulties have you encountered when developing and applying neural networks?
      Describe a model training pipeline.
      Link to the additional assignment for this track: https://gitlab.zvonok.in/-/snippets/30.
